ABSTRACT

Several criteria for assessing bias in educational tests are presented
and discussed. These criteria were developed in accordance with basic
notions of fairness, equality, and expanded life options for women. In
terms of prescriptions for test developers, the criteria are: (1) tests
should be constructed of items which contain either no sex references or
equal sex references; (2) the status of males and females within the test
should be equal; (3) item content should not reinforce traditional sex
stereotypes. Tests currently in use may be considered biased if: (4) item
content in terms of male or female statuses or stereotypes affects the
performance of males or females differentially; (5) the test predicts
differentially for males and females; (6) the test is normed separately
for males and females, unless separate norms are used to ensure balance
in selection; (7) the test is constructed so that female futures may be
separated from male futures. (Author/RC)

Sex Bias in Educational Testing: A Sociologist's Perspective

Marlaine Lockheed-Katz
Educational Testing Service

The purpose of education, and hence of educational testing, should
be to expand the life options of individuals. In too many cases, however,
the life options available to women are limited and rigidly defined. In
an effort to improve the status of women internationally, the United
Nations General Assembly adopted a "Declaration on the Elimination of
Discrimination Against Women" in 1967. This document states that
discrimination against women is fundamentally unjust and inconsistent
with human dignity; it calls for the abolition of existing laws, customs,
regulations and practices which are discriminatory against women, and
firmly establishes the principle of equal access to education for women.

Present educational practices, however, still reflect commonly held
beliefs about fundamental differences between the sexes. Thus schools may
be segregated by sex, offer different curricula for males and females,
provide fewer resources for female activities than for male activities,
and prepare females for different careers from males.

One major adjunct to education is educational testing, yet on several
dimensions educational tests appear to reinforce beliefs about the
differences between the sexes. A review of major educational tests
suggests that men and women are not presented equally in these tests.
Unequal distribution of items between males and females, stereotyped
images of both men and women, and separate interpretations of test
results for men and women are characteristic of such tests. Such
inequities may restrict the life options of women.



The purpose of this paper is to present criteria for the construction
and evaluation of tests which expand the options of men and women. These
criteria may be applied to evaluate tests administered to any
heterogeneous population.

There are seven criteria against which a test may be judged for bias:

1. the actual distribution of test items dealing with male and
   female actors

2. the status of the males and females within the items

3. the content of items relative to traditional or stereotyped male
   or female interests or skills

4. the effect of (1), (2), or (3) above on male or female success
   on any item or items or on the test as a whole

5. the overall predictive validity of the test for males and females
   with respect to some criterion such as future grades, and the use
   of tests for selection based upon such prediction

6. the use of separate norms for evaluating the test performance of
   males and females

7. the uses made by counselors and others to predict future
   occupational interests or skills of males and females as a result
   of their test performance, when such predictions separate male
   futures from female futures.

Each of these criteria will be considered in this paper.

1. The distribution of items dealing with male and female actors.

In a 1973 study funded by the Ford Foundation, Tittle investigated the
occurrence of male and female references in 9 different series of tests of
academic achievement used for American students from kindergarten through
12th grade. Overall, 29 separate tests were examined. Within each test,
all references to males and females were counted. The generic use of the
words "man" and "he" was counted separately.

Of the tests scored, all but one contained a higher number of male
references than female references. The ratio ranged from 14-1 to
slightly less than 1-1. Only 8 of the 29 test batteries examined had
less than a 2-1 ratio of male references to female references. The older
the age group for whom the test was written, the higher the male/female
ratio.

Lockheed-Katz (1973) conducted a similar investigation of male and
female references, by item, in eight major college and graduate school
entrance examinations. Similar imbalance was found within these tests.
Items were coded according to whether they contained no sex reference,
male only sex reference, female only sex reference, or both male and
female sex reference. The ratio of male only items to female only items
ranged from 16-1 to 2-1. Only 22 of the 1220 items coded contained
references to both men and women. Seventy-five percent of all the items
contained no sex reference, but of the remaining 25%, more than 4/5 were
"male only items."

If tests are to be constructed in such a way as to reflect an
implicit egalitarian ideology, then either sex-referenced items should be
entirely eliminated or a balance between items dealing with male actors
and female actors should be achieved.
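
The item coding described above (no sex reference, male only, female
only, both) can be illustrated with a minimal sketch. It is not the
coding procedure used in either study: the word lists, the function
names and the sample items are all hypothetical, and a real scheme would
need a far fuller vocabulary and separate handling of the generic "man"
and "he".

```python
import re
from collections import Counter

# Hypothetical word lists, for illustration only.
MALE_WORDS = {"he", "him", "his", "man", "men", "boy", "boys", "father", "mr"}
FEMALE_WORDS = {"she", "her", "hers", "woman", "women", "girl", "girls",
                "mother", "mrs", "ms"}

CATEGORIES = ("none", "male_only", "female_only", "both")


def code_item(text):
    """Code one item as 'none', 'male_only', 'female_only', or 'both'."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    has_male = bool(words & MALE_WORDS)
    has_female = bool(words & FEMALE_WORDS)
    if has_male and has_female:
        return "both"
    if has_male:
        return "male_only"
    if has_female:
        return "female_only"
    return "none"


def tally(items):
    """Count how many items fall into each coding category."""
    counts = Counter(code_item(text) for text in items)
    return {category: counts.get(category, 0) for category in CATEGORIES}


def male_female_ratio(counts):
    """Ratio of male-only to female-only items (None if no female-only items)."""
    if counts["female_only"] == 0:
        return None
    return counts["male_only"] / counts["female_only"]


# Invented items, not taken from any real test.
sample_items = [
    "A man drives 40 miles in one hour. How far does he drive in 3 hours?",
    "What is the sum of 12 and 30?",
    "Mr. Lee pays his clerk $5 a day. How much does the boy earn in 4 days?",
    "A girl measures 2 cups of flour. How many cups does she need for 3 batches?",
    "The principal asked her assistant to check his figures.",
]
counts = tally(sample_items)
print(counts, male_female_ratio(counts))
```

A balanced test in the sense of this criterion would show a ratio near 1,
or no male-only and female-only items at all.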

2. A second criterion for judging a test for bias is to determine
how the male and female actors are portrayed within items.

In her study of sex-role stereotyping in achievement tests, Tittle
reported that "women are portrayed almost exclusively as homemakers or in
the pursuit of hobbies." Furthermore, she reported that "some items imply
that the majority of professions are closed to women."

In the investigation of the college and graduate school admissions
tests conducted by Lockheed-Katz, the relative status of males and females
in a single item was coded. Of the 22 items (out of 1220) which contained
both a male and a female actor, 10 items portrayed the men and women
as equal in status; the remaining 12 items showed the men as being higher
status than the women. No item on any of the eight tests portrayed a
woman in a higher status position than a man, for example, as a female
principal with a male teacher or a female lawyer with a male client.

Typically females were referred to as mothers, teachers, secretaries
or wives. Males were referred to as lawyers, managers, principals,
superintendents, doctors or other professionals. Since there are now
women in high status positions relative to men, and since true equality
of opportunity implies that men and women should have equal access to
both high and low status occupations, tests should also reflect this
equality.

3. A third criterion against which to judge a test for bias is the
distribution of items relative to traditional or stereotyped male or
female interests or skills. In American testing this general area of
concern, which may be considered the cultural relevance of a test, has
been centered about the issue of minority representation on tests. Thus
studies by Quirk and Medley (1972), Linn (1973) and others suggest that
American tests have been culturally specific and may be inappropriate
for use with non-Anglo populations.

The same may be said with respect to the male-ness of tests and items.
Unfortunately, the test constructor may face a dilemma: to include items
which reflect "female" interests may imply the reinforcement of sex-role
stereotypes. That is, to include kitchen measurement items in a math
test may make the item more manageable for females at the cost of
reinforcing the stereotype which says that woman's place is in the home.

It is possible to construct items in which the actor and the actions
are not stereotypically associated. Such items might include a boy
measuring flour in the kitchen or a girl fixing a bicycle. The culturally
specific item is retained, but the stereotype is broken by substituting a
different actor than would be expected. No studies have been found which
report any but extreme sex role stereotypes in test items.

The preceding three criteria against which to judge a test:
 - test items to be balanced with respect to male and female references
 - the relative status of males and females within items to be balanced
   or equal
 - cultural interests of males and females to be balanced without
   reinforcing cultural stereotypes
are based upon the layman's notion of fairness, which implies an equal
representation of conflicting or divergent interests or likes. If a test
is biased according to these criteria, it is not intrinsically fair, as it
does not represent males and females equally, whether or not the test
discriminates between male and female test takers.

4. Another criterion for assessing bias is the ability of either items
or a total test to discriminate between males and females. In other words,
what effect do the three previously mentioned biases have on the performance
of males and females on tests?

At present, there is little reported that answers this question. Some
studies report analyses of these issues in connection with ethnic diversity.

Echternacht, Carlson and Flaugher (1973) described three studies
examining differences in item difficulty for black and white test takers.
They describe alternative strategies for assessing test bias. The first
strategy was to regress the delta, or index of item difficulty, for blacks
or whites for each item type on a given test. This allowed the
investigators to determine if certain types of items were more difficult
for blacks than for whites.

A second approach employed by Echternacht, Carlson and Flaugher was
to determine if, for any pair of items, the easier of the items for the black
test takers was the harder for the white and vice versa. By using this paired
comparison, a test could be constructed that would be equally difficult for
blacks and whites, although the difficulty would differ at the item level.
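
A minimal sketch of such a paired comparison follows. It only
illustrates the idea of looking for item pairs whose difficulty order
reverses between two groups; the function name, the use of proportion
correct as the difficulty measure, and the data are assumptions, not
details taken from the study.

```python
from itertools import combinations


def reversed_pairs(p_group_a, p_group_b):
    """Return item-index pairs whose difficulty order differs between groups.

    p_group_a[i] and p_group_b[i] are the proportions of each group answering
    item i correctly (a higher proportion means an easier item).
    """
    pairs = []
    for i, j in combinations(range(len(p_group_a)), 2):
        diff_a = p_group_a[i] - p_group_a[j]
        diff_b = p_group_b[i] - p_group_b[j]
        if diff_a * diff_b < 0:  # opposite signs: the order is reversed
            pairs.append((i, j))
    return pairs


# Invented proportions correct for four items in two groups.
p_a = [0.80, 0.60, 0.40, 0.70]
p_b = [0.75, 0.65, 0.50, 0.55]
print(reversed_pairs(p_a, p_b))
```

The same comparison could be run with male and female test takers as the
two groups.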



A third strategy suggested was to correlate black and white
responses to items, producing average within race and cross race
correlations, and to compute an ANOVA on these correlations to determine
if item-race interactions existed.

Echternacht (1972) reported a study of item-sex interaction, using
the first method described above, that found 3 items out of 30 in the
Aptitude Test for Graduate School of Business which showed differences
between males and females in response; no information about the nature of
the items was included in the report, however.
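
One way to picture a regression of item difficulties across the sexes is
sketched below. It is an illustration under stated assumptions, not
Echternacht's procedure: it assumes the conventional delta scale
(13 minus 4 times the normal deviate of the proportion correct), regresses
women's deltas on men's, and flags items whose residual exceeds an
arbitrary threshold. All names and data are hypothetical.

```python
from statistics import NormalDist, mean


def delta_index(p_correct):
    """Delta difficulty index, assumed here to be 13 - 4 * z(p_correct)."""
    return 13.0 - 4.0 * NormalDist().inv_cdf(p_correct)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def flag_items(p_men, p_women, threshold=1.0):
    """Regress women's deltas on men's; flag items with large residuals."""
    d_men = [delta_index(p) for p in p_men]
    d_women = [delta_index(p) for p in p_women]
    slope, intercept = fit_line(d_men, d_women)
    # A positive residual means the item is harder for women than the
    # overall relationship between the two sets of deltas would predict.
    residuals = [dw - (intercept + slope * dm)
                 for dm, dw in zip(d_men, d_women)]
    flagged = [i for i, r in enumerate(residuals) if abs(r) > threshold]
    return flagged, residuals


# Invented proportions correct for six items; the threshold is arbitrary.
p_men = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
p_women = [0.9, 0.8, 0.7, 0.3, 0.5, 0.4]
flagged, residuals = flag_items(p_men, p_women)
print(flagged)
```

In this invented data set only the fourth item (index 3) departs from the
common trend. Whether such an item is biased would still depend on an
inspection of its content, which is the information the report omitted.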


Another study of total test performance for men vs. women was conducted
by Swineford (1972) on the Law School Admissions Test. Two kinds of items
showed differences in performance. Women performed better on verbal items
while men performed better on Data Interpretation sections. No attempt
was made to examine the items individually.

Coffman (1961) examined items in an aptitude test and made predictions
with respect to the content of the items as to which items would favor men
and which would favor women. These items were judged for traditional
interest or skills of men and women. Of the 16 judgments, 14 were in the
predicted direction. Of the nine items which involved mechanical knowledge,
science or business and were judged to be easier for men, eight actually
were easier for male respondents. Of the ten items which involved personal
feelings or personality characteristics, and were judged easier for women,
nine were actually easier for women.

Donlon (1971) reported a study patterned after Coffman's in which the
scores of the 103,275 persons who took the Scholastic Aptitude Test in
May 1964 were examined for sex differences by item. Items for which the
women's performance was statistically superior to the men's, and items
for which the men's performance was superior to the women's, were located
and examined for sex related bias. Of the 90 verbal items on the test, 8
items favored men and 11 favored women. Seven of the eight items favoring
males were coded as having scientific or "practical affairs" content; 8 of
the 11 items favoring females were coded as having human relations,
humanities or aesthetic-philosophical content. The content of these
items, relating to stereotyped or culturally associated interests of men
and women, apparently accounted for the differences in male and female
performances on the item.

Milton (1958) reported five studies in which differences between males
and females in problem solving were explained by the sex-role stereotype of
the problem. Milton documented and replicated the finding that when
problems were framed so as to make them less appropriate to the masculine
role, sex differences in problem solving were reduced.
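
An item-by-item comparison of the kind Donlon reported can be sketched
with a two-proportion z test. The test statistic, the significance level
and the counts below are assumptions chosen for illustration; the source
does not say which statistical criterion was used.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(correct_m, n_m, correct_f, n_f):
    """z statistic and two-sided p-value for the male-female difference
    in the proportion answering an item correctly."""
    p_m, p_f = correct_m / n_m, correct_f / n_f
    pooled = (correct_m + correct_f) / (n_m + n_f)
    se = sqrt(pooled * (1 - pooled) * (1 / n_m + 1 / n_f))
    z = (p_m - p_f) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def classify_item(correct_m, n_m, correct_f, n_f, alpha=0.01):
    """Label an item by the direction of any significant sex difference."""
    z, p_value = two_proportion_z(correct_m, n_m, correct_f, n_f)
    if p_value >= alpha:
        return "no difference"
    return "favors men" if z > 0 else "favors women"


# Invented counts: (men correct, men tested, women correct, women tested).
items = {
    "item_a": (700, 1000, 600, 1000),
    "item_b": (500, 1000, 505, 1000),
    "item_c": (400, 1000, 520, 1000),
}
labels = {name: classify_item(*counts) for name, counts in items.items()}
print(labels)
```

With samples as large as the one Donlon examined, very small differences
can reach statistical significance, so the flagged items would still need
the kind of content review described above.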


5. A fifth criterion for judging whether a test is biased relates to
its overall predictive validity with regard to an external criterion, such
as future grades. Typically this sort of test analysis is used for selection
of test takers into college or an occupation. In fact, it appears difficult
to separate the issue of the predictive validity of a test from the use of
the test for selection purposes. Tittle's review of test predictive validity
begins by summarizing research conducted on black-white and male-female
predictions of college grades from Scholastic Aptitude Test scores, and
ends by reviewing models of selection bias.

Seashore (1962) summarized several studies which conclude that women's
grades are more accurately predicted by tests than are men's. These
conclusions are reiterated by Cole (1973), who analyzed data from students
enrolled in 19 American coeducational colleges and found that standard
tests and high school grades better predicted women's first term grades
than men's first term grades.

The main rationale for examining the predictive validity of a test,
however, is for selection purposes. Selection is itself subject to bias.



Cole (1972) distinguishes six models of selection bias and applies
these models to male-female selection. A single set of data is analyzed
according to the six models, and judgments about the fairness of the
selection practice are made based upon an analysis of the data. The six
models of selection bias are:

1. The regression model, where test bias refers to either
   overprediction or underprediction of criterion measures for different
   populations using a single regression equation.

2. The quota model, where selection is made according to the percent
   distribution of each subgroup within the total population.

3. The "subjective" regression model, in which a constant is added
   to minority group scores to increase the probability of their
   selection according to the regression model.

4. The equal risk model, where all persons who have the same
   probability of being successful on a criterion measure are
   selected.

5. The constant ratio model, where selection by group is made in
   proportion to that group's success on the test.

6. The conditional probability model, in which the probability of
   being selected is contingent upon achieving a satisfactory
   criterion score and is not related to group membership.

Applying these models to the selection of men and women into college,
ACT (1973) reported that using separate regression equations in the
regression model is fair, but that combined equations are biased against
women. Linn (1973) also reported a similar finding with regard to black
and white male and female students accepted at 22 different colleges. That
is, women's achievement is underpredicted using a regression equation based
upon all-male or combined data.
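
The notion of underprediction under the regression model can be made
concrete with a small sketch. It fits a single regression equation to
combined data and averages the prediction errors separately for each sex;
a positive average means that group's grades are underpredicted. The
scores, grades and function names are invented for illustration and do
not reproduce the ACT or Linn analyses.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_residual_by_group(scores, grades, groups):
    """Mean of (actual - predicted) grade per group, using one combined
    regression equation. Positive values indicate underprediction."""
    slope, intercept = fit_line(scores, grades)
    residuals = {}
    for score, grade, group in zip(scores, grades, groups):
        predicted = intercept + slope * score
        residuals.setdefault(group, []).append(grade - predicted)
    return {group: mean(values) for group, values in residuals.items()}


# Invented data: at each test score, women earn grades 0.3 points higher.
scores = [400, 500, 600, 700, 400, 500, 600, 700]
grades = [2.0, 2.5, 3.0, 3.5, 2.3, 2.8, 3.3, 3.8]
groups = ["M", "M", "M", "M", "F", "F", "F", "F"]
result = mean_residual_by_group(scores, grades, groups)
print(result)
```

In this invented example the combined equation underpredicts women's
grades and overpredicts men's by the same amount, which is the pattern
the cited studies describe.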

The other models of selection bias applied by Cole to the 19 school
data revealed different patterns of fairness to men and women.

The quota model is frequently preferred for the selection of men and
women; it permits the selection of fewer women than would qualify under
the regression model.

Both the constant ratio and the conditional probability models were
found to be unfair to men, while the equal risk model was judged to be
fair but impractical.

6. A sixth criterion for judging bias in a test centers about the
implicit assumptions associated with reporting separate norms. When
norms on a test are presented separately for males and females, as they
are for many tests, it reinforces the implicit belief that male test
performance and female test performance should be different. The
reinforcement of beliefs regarding male and female abilities by separate
norming is an issue which is distinct from the issue regarding the use of
norms for selection purposes.

Although the normal distribution of test scores may at present be
different for males and females, there is some evidence that cultural
rather than genetic influences account for these differences.
For example, a study by Fremer, Coffman, and Taylor (1968), "The College
Board Scholastic Aptitude Test as a predictor of academic achievement in
secondary schools in England," showed that while American freshman girls
score higher on verbal than on mathematical aptitude, British girls of the
same age score higher on the mathematical than the verbal tests. Peck
(1971) also reported no notable systematic sex differences in performance
on aptitude and achievement tests across eight different countries. He
attributed such differences as occur to cultural differences.

When norms are used for selection purposes, however, it may be that
separate norms will promote more egalitarian representation of males and
females in the roles for which they are being selected. Thus, selecting on
the basis of the top 10 percent of the males and the top 10 percent of the
females will yield a selected group balanced by sex, while selecting on the
top 10 percent of a combined group of males and females will yield a selected
group which is imbalanced toward one or the other sex if the norms of the two
groups are actually different.
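
The contrast between selecting within each sex and selecting from the
combined group can be shown with a short sketch. The score lists are
invented, the two groups are assumed to be of equal size, and the
function names are hypothetical.

```python
def select_top(scores, fraction):
    """Indices of the top `fraction` of scores (at least one person)."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def count_selected(men, women, fraction=0.10):
    """Compare selection within each sex to selection from the pooled group."""
    separate = {
        "men": len(select_top(men, fraction)),
        "women": len(select_top(women, fraction)),
    }
    combined = [("men", s) for s in men] + [("women", s) for s in women]
    chosen = select_top([score for _, score in combined], fraction)
    pooled = {"men": 0, "women": 0}
    for i in chosen:
        pooled[combined[i][0]] += 1
    return separate, pooled


# Invented scores: 20 men and 20 women, with the men's distribution
# shifted five points higher.
men_scores = list(range(50, 70))
women_scores = list(range(45, 65))
separate, pooled = count_selected(men_scores, women_scores)
print(separate, pooled)
```

With these invented scores, separate norms select two men and two women,
while the pooled top 10 percent consists of four men and no women.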

The issue of separate norming is consequently complicated. While
reporting separate norms may reinforce stereotyped beliefs regarding male
and female abilities, failing to do so may reduce the likelihood of equal
male and female selection.

7. The final criterion for evaluating a test for bias is the extent
to which it may be used to separate male futures from female futures.
This bias is most clearly observed in vocational interest tests which
typically separate male and female interests.

For example, Tittle reported that the Strong-Campbell Interest Inventory,
a unisex modification of the Strong Vocational Interest Blank, contains
occupations identified by the letter "m" or "f" for male or female.
Although physician is listed as an occupation for both males and females,
biologist, cartographer, social scientist, architect, minister, school
superintendent and sales manager are identified only as male occupations.
The Kuder Occupational Interest Survey contains even more blatant
distinctions between male and female occupational and educational
interests. Tittle reported that of the 77 male occupational scales and the
57 female occupational scales, there are only 16 scales for identically
stated occupations for males and females. College major scales also
reflect separate and unequal opportunities for men and women. The
historical fact that women did not have, or were not allowed to have,
certain interests is surely no excuse to discourage future generations
of women from considering these interests.



This paper has attempted to present briefly several criteria for
assessing bias in educational tests. These criteria were developed in
accordance with basic notions of fairness, equality and expanded life
options for women. To summarize these criteria in terms of prescriptions
for test developers, they are:

1. tests should be constructed of items which contain either no sex
   references or which are balanced for male and female references

2. the status of the males and females within the test should be
   equal

3. the content of items should not reinforce traditional or
   stereotyped images of men and women.

Tests which are currently in use may be considered biased if:

4. the content of the items in terms of male or female statuses
   or stereotypes affects the performance of males or females
   differentially

5. the test predicts differentially for males and females

6. the test is normed separately for males and females, unless
   separate norms are used to ensure balance in selection

7. the test is constructed so that female futures may be
   separated from male futures.

The principle of fairness which calls for eliminating discrimination
against women and providing women with equal access to education requires
that all aspects of education be free from discriminatory material. This
requirement applies to educational tests in particular, as tests of
achievement and aptitude typically determine both men's and women's access
to future education.